Abstract. We present a detailed study of the mechanism by which the INVERT
method [Phys. Rev. Lett. 104, 125501] guides structure refinement of disordered 
materials. We present a number of different possible implementations of the 
central algorithm and explore the question of algorithm weighting. Our analysis 
includes quantification of the relative contributions of variance and fit-to-data 
terms during structure refinement, which leads us to study the roles of density 
fluctuations and configurational jamming in the RMC fitting process. We present 
a parametric study of the pair distribution function solution space for C60, a-Si
and a-SiO2, which serves to highlight the difficulties faced in developing a
transferable weighting scheme. 
1. Introduction

The absence of a generic methodology for determining the atomic-scale structure of 
disordered materials remains one of the key problems in contemporary structural 
science. Motivating the search for a solution is the desire to understand 
structure/property relationships in situations where disordered materials play a 
central role. Examples include biomineralisation processes [1], pharmaceutical
polymorphism and stability [2–4], data storage in phase-change chalcogenides
[5–7], and pressure- and temperature-induced amorphisation of oxide and
metal-organic frameworks [8–13]. Because these materials lack long-range periodic
order, established crystallographic techniques cannot be used to produce structural
models. Instead, techniques that are sensitive to the existence and nature of
short-range structural correlations represent the only possibilities for
experiment-driven structure determination. It is for this reason that spectroscopic techniques
such as NMR and EXAFS, together with the diffraction method of total scattering
(often termed "pair distribution function", or PDF, measurements), have collectively
played an important role in developing our understanding of disordered materials
[14–17].

Of these various experimental techniques, PDF measurements arguably provide 
the most direct probe of local structural order [18]. The PDF itself represents a
histogram of interatomic separations, weighted by the relative concentration and
scattering power of the different atom types present. So, even by using simple peak 
fitting methods it is possible to extract directly from the PDF the values of nearest- 
and next-nearest neighbour bond lengths and the corresponding coordination numbers. 
Coupled with chemical intuition, these values can often be used to infer longer- 
range and/or higher-order structural correlations, such as bond angles and network 
connectivity. But because the PDF is so straightforward to calculate from any given 
structural model, genuine structural refinement is now computationally tractable in 
a way that is not yet possible for many types of spectroscopic measurements (e.g.
NMR). Perhaps the most widely used refinement approach is that implemented in the
software PDFGui [19]. The general strategy employed by PDFGui bases refinement 
on the interatomic correlations present in a relevant periodic structure. The unit cell 
dimensions, atom coordinates and thermal displacement parameters are all refined 
against the PDF using a Rietveld-like algorithm. The absence of long-range order is 
then treated by some combination of restricting the fit to only the lowest-r region and 
incorporating a damping term. This damping term contains useful information when 
it can be interpreted in terms of a characteristic coherence length-scale (which may 
itself have orientational dependence), reflecting e.g. nanoparticle or domain size [20].

But both peak fitting and real-space Rietveld approaches have two serious
shortcomings that in principle limit the scientific value of the structural information 
extracted from the PDF. The first is that neither actually produces a structural 
model in the form required for ab initio electronic structure calculations or for 
molecular dynamics simulations. If the science of interest arises from understanding 
the relationship between structure and property in a material, this limitation can prove 
serious. Second, disordered materials can — and do — have different structures to their 
crystalline analogues [21]. Any approach that relies on using a periodic structure as
its central reference inherently prohibits the exploration of disordered systems whose 
structures cannot be understood as nanocrystalline arrays. 

The established alternative is to use an atomistic approach [52]. This involves 
calculating the PDF from a configuration of atoms that (i) is larger than the coherence 
length evident in the PDF, (ii) is subject to periodic boundary conditions, and 
(iii) is assembled using a composition and density appropriate for the material in 
question. The corresponding PDF is again straightforwardly calculated using the
atomic coordinates, but without the need to apply any post hoc damping correction.
Refinement involves varying the atomic coordinates in this configuration until the 
best possible fit to data is achieved, often making use of a reverse Monte Carlo 
(RMC) algorithm in order to explore the large configurational space associated with 
perhaps thousands of positional parameters. In contrast to "traditional" structure 
refinements, there is no expectation that the PDF data are described by a unique 
set of atomic coordinates. Instead it is the set of general reproducible features (e.g.
topological connectivity, bond-length, bond-angle and torsional distribution functions)
that is interpreted as the structure solution in this case. Ideally, the solution obtained 
should not depend on the starting configuration; nor should it be necessary to assume 
structural features (e.g. coordination numbers and geometries) during refinement.

Arguably the most severe shortcoming of atomistic approaches such as RMC
is the fact that meaningfully different configurations can give rise to PDF fits of
indistinguishable quality [53]. Moreover, it is usually the case that the physically
sensible solutions are configurationally less accessible than are the vastly more
numerous nonsensical solutions, so not only are incorrect solutions possible but they
are in fact more likely. In terms of the configurational landscape involved, we are
describing a situation where the minima associated with meaningful structural
solutions are few and steep, and where there are many equally deep but broader
minima associated with unphysical solutions [Fig. 1(a)]. This observation frames the
crucial challenge for nanostructure determination: namely, how does one redefine
the energy landscape in a generic and transferable manner so as to ensure that the
deepest minima always correspond to meaningful solutions and that these minima are
configurationally accessible [Fig. 1(b)]?

Figure 1. Schematic representation of the PDF refinement landscape. (a)
Physically unreasonable (mostly red; incorrect local coordination) and sensible
(blue; correct local coordination) configurations give rise to equally deep wells
in configuration space, but the former are often more accessible. (b) The goal
of nanostructure determination algorithms is to reshape solution space so as
to increase the relative depth and breadth of those minima associated with
meaningful solutions.

In a previous paper, we introduced the idea that experimental constraints on 
the number of unique environments might be used to guide structure refinement in 
a sensible way [23]. This approach, dubbed INVERT (INVariant Environment
Refinement Technique), developed from a realisation that the single most obvious flaw
in RMC configurations of canonical disordered systems such as amorphous silicon was
their incorporation of an unphysical number of different coordination environments.
Whereas the traditional remedy has been to enforce a priori assumptions concerning
the final structural model (e.g. coordination numbers and geometries), we posited that
the number and distribution of unique environments determined using spectroscopic
measurements could be used as a constraint without needing to assume anything
further about the nature of the environments themselves. For a handful of simple
systems (C60, S12, a-Si and a-SiO2) we demonstrated that an RMC+INVERT
approach was markedly more effective than native RMC methods at arriving at
sensible structure solutions.

In this paper, we explore in greater detail the mechanism by which INVERT 
actually guides nanostructure refinement, with a view to establishing how best the 
approach might be developed and applied in future studies. We begin by describing 
some of the different possible implementations of the central algorithm and then 
proceed to explore the question of algorithm weighting: namely, how does one balance 
the variance term with respect to the fit-to-data, and how is this balance affected 
by the presence of more than one type of chemical environment? In addressing 
this question, we quantify the relative contributions of variance and fit-to-data terms 
during structure refinement, and this leads us to study the roles of density fluctuations 
and configurational jamming in the RMC fitting process. We present a parametric 
study of PDF solution space for C60, a-Si and a-SiO2, which serves to highlight the
difficulties faced in developing a transferable weighting scheme. Our paper concludes 
with a discussion of the major challenges and opportunities associated with developing 
further the INVERT approach in future studies. 



2. Implementation 

At its heart, the INVERT concept involves minimising the variance amongst the
individual atomic PDFs for atoms in equivalent environments. Representing the
individual PDFs by the term $g_i(r)$, one has for a single-environment system of $N$ atoms:

$$\mathrm{Var}[g(r)] = \frac{1}{N}\sum_i \left[g_i(r) - \langle g(r)\rangle\right]^2 \tag{1}$$

$$= \langle g_i(r)^2\rangle - \langle g(r)\rangle^2. \tag{2}$$

In the limit of zero variance, all of the $g_i(r)$ are equal to the same average $\langle g(r)\rangle$, and
hence if this average PDF is to represent a good fit to the experimental PDF $G_{\mathrm{expt}}(r)$
then we essentially require that each $g_i(r)$ also match $G_{\mathrm{expt}}(r)$. This argument suggests
a natural implementation of the INVERT approach in the case of a single environment,
where the configuration quality is measured by the similarity of each atomic PDF to
the experimental function:

$$\chi^2_{\mathrm{INVERT}} = \sum_r \left\{ \frac{1}{N}\sum_i \left[g_i(r) - G_{\mathrm{expt}}(r)\right]^2 \right\}. \tag{3}$$

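We note at this point that the definitions of $g(r)$ and $G(r)$ are important, because the
differences amongst the $g_i(r)$ depend on them for glassy systems [25]; in our work
we use the definitions of Ref. [26], which enable direct comparison as suggested above.
Recognising then that the standard RMC algorithm involves minimising the function

$$\chi^2_{\mathrm{RMC}} = \sum_r \left[\langle g(r)\rangle - G_{\mathrm{expt}}(r)\right]^2, \tag{4}$$

it is straightforward to show that Eq. (3) reduces to

$$\chi^2_{\mathrm{INVERT}} = \chi^2_{\mathrm{RMC}} + \sum_r \mathrm{Var}[g(r)], \tag{5}$$

since, on writing $g_i(r) - G_{\mathrm{expt}}(r) = [g_i(r) - \langle g(r)\rangle] + [\langle g(r)\rangle - G_{\mathrm{expt}}(r)]$, the cross term vanishes on averaging over $i$.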



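As a quick numerical sanity check on Eq. (5), the short NumPy sketch below builds a toy set of atomic PDFs on a common r-grid and confirms that the INVERT penalty of Eq. (3) equals the RMC misfit of Eq. (4) plus the summed variance. The toy PDF, the noise level and the array layout (one row per atom, one column per r-bin) are illustrative assumptions rather than data from this study; the variance is the population variance (1/N), as in Eq. (1).

```python
import numpy as np

def chi2_invert(g_atoms: np.ndarray, G_expt: np.ndarray) -> float:
    """Eq. (3): mean squared misfit of each atomic PDF, summed over r.

    g_atoms has shape (N, n_r): one row per atom i, one column per r-bin.
    """
    return float(np.sum(np.mean((g_atoms - G_expt) ** 2, axis=0)))

def chi2_rmc(g_atoms: np.ndarray, G_expt: np.ndarray) -> float:
    """Eq. (4): misfit of the configurational average <g(r)>."""
    g_avg = g_atoms.mean(axis=0)
    return float(np.sum((g_avg - G_expt) ** 2))

def variance_term(g_atoms: np.ndarray) -> float:
    """Sum over r of Var[g(r)], using the 1/N population variance of Eq. (1)."""
    return float(np.sum(g_atoms.var(axis=0)))

rng = np.random.default_rng(0)
r = np.linspace(0.5, 10.0, 200)
G_expt = 1.0 + np.exp(-(r - 2.35) ** 2 / 0.01)                 # toy "experimental" PDF
g_atoms = G_expt + 0.2 * rng.standard_normal((64, r.size))      # 64 noisy atomic PDFs

lhs = chi2_invert(g_atoms, G_expt)
rhs = chi2_rmc(g_atoms, G_expt) + variance_term(g_atoms)
print(np.isclose(lhs, rhs))  # → True
```

The identity holds for any data, so the check does not depend on the particular noise drawn.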
An extension to multiple environments also follows, where we assume initially that
the experimental PDF can be decomposed into its constituent partial PDFs $G^{\alpha\beta}_{\mathrm{expt}}(r)$:

$$G_{\mathrm{expt}}(r) = \sum_{\alpha,\beta} c_\alpha c_\beta b_\alpha b_\beta\, G^{\alpha\beta}_{\mathrm{expt}}(r). \tag{6}$$

Here $\alpha$ and $\beta$ index the different environments (or atom types), and the $c$ and $b$ are the
corresponding relative concentrations and scattering strengths of those environments.
The standard RMC penalty is

$$\chi^2_{\mathrm{RMC}} = \sum_r \sum_{\alpha,\beta} \left[ c_\alpha c_\beta b_\alpha b_\beta \left( \langle g^{\alpha\beta}(r)\rangle - G^{\alpha\beta}_{\mathrm{expt}}(r) \right) \right]^2 \tag{7}$$

and hence, by the same argument as for Eq. (5),

$$\chi^2_{\mathrm{INVERT}} = \chi^2_{\mathrm{RMC}} + \sum_r \sum_{\alpha,\beta} \left(c_\alpha c_\beta b_\alpha b_\beta\right)^2 \mathrm{Var}[g^{\alpha\beta}(r)]. \tag{8}$$

What this suggests is that the $[c_\alpha c_\beta b_\alpha b_\beta]^2$ provide the natural weightings for
the variance in each partial atomic PDF $g_i^{\alpha\beta}(r)$. We note that this weighting strategy
can be applied even if the individual $G^{\alpha\beta}_{\mathrm{expt}}(r)$ are not experimentally separable.

Figure 2. Three possible methods for fitting the PDF $G_{\mathrm{expt}}$ (black curve): (a)
using the configurational average $\langle g(r)\rangle$, which is continuous in r; (b) using an
individual atomic PDF $g_i(r)$, which gives rise to a discrete and unhelpful difference
function; (c) using a distance list, which gives a discrete but meaningful difference
function well suited to atomistic refinement. In each case the fit is shown in blue,
and the difference function in red.

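The multi-environment weighting can be sketched in the same way. The NumPy example below assumes a partial-resolved penalty in which each pair (α, β) contributes with weight (c_α c_β b_α b_β)², and splits it into a fit-to-data part and a variance part. The dictionary layout, the function names and the Si/O concentrations and scattering strengths are hypothetical placeholders chosen for illustration, not values or code used in this work.

```python
import numpy as np

def weighted_invert_terms(partials, targets, weights):
    """Split a partial-resolved INVERT penalty into fit-to-data and variance parts.

    partials[(a, b)] : array (N_a, n_r); row i is the atomic partial PDF g_i^{ab}(r)
                       of atom i (type a) with respect to neighbours of type b.
    targets[(a, b)]  : array (n_r,); the partial PDF G^{ab}_expt(r).
    weights[(a, b)]  : scalar c_a c_b b_a b_b; it enters squared.
    """
    chi2_fit, chi2_var = 0.0, 0.0
    for pair, g in partials.items():
        w2 = weights[pair] ** 2
        chi2_fit += w2 * np.sum((g.mean(axis=0) - targets[pair]) ** 2)
        chi2_var += w2 * np.sum(g.var(axis=0))  # population variance over atoms of type a
    return float(chi2_fit), float(chi2_var)

def direct_invert(partials, targets, weights):
    """Same penalty written directly as a weighted mean misfit of every atomic partial PDF."""
    return float(sum(weights[p] ** 2 * np.sum(np.mean((g - targets[p]) ** 2, axis=0))
                     for p, g in partials.items()))

# Hypothetical two-species example (Si/O); all numbers are placeholders.
rng = np.random.default_rng(1)
n_r = 100
conc = {"Si": 1 / 3, "O": 2 / 3}
scat = {"Si": 1.0, "O": 1.5}
n_atoms = {"Si": 16, "O": 32}
partials, targets, weights = {}, {}, {}
for a in ("Si", "O"):
    for b in ("Si", "O"):
        targets[(a, b)] = rng.random(n_r)
        partials[(a, b)] = targets[(a, b)] + 0.1 * rng.standard_normal((n_atoms[a], n_r))
        weights[(a, b)] = conc[a] * conc[b] * scat[a] * scat[b]

fit, var = weighted_invert_terms(partials, targets, weights)
print(np.isclose(fit + var, direct_invert(partials, targets, weights)))  # → True
```

The four (α, β) entries correspond to the four pairwise variance terms of a two-component system; reweighting them (for instance, equally) only requires changing the `weights` dictionary.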
In deriving the above equations, the implicit assumption is made that the $g_i(r)$
are directly comparable in a meaningful way. In a real material the functions
$g_i(r)$ represent a time integral, such that each function could indeed resemble the
time and configurational average $\langle g(r)\rangle$. The same is not true, of course, for static
atomistic configurations such as those used for RMC refinements: the calculated $g_i(r)$
consist of a series of delta functions that can be taken to represent an instantaneous
$g_i(r,t)$. Ergodicity allows comparison of the configurational average $\langle g(r)\rangle$ with the
experimental $G_{\mathrm{expt}}(r)$, but the same comparison is not meaningful for the $g_i(r,t)$
themselves [Fig. 2].

One possible approach to resolving this problem is to convolve the $g_i(r,t)$ with a
broadening function that represents the effect of thermal motion. For real materials,
this motion is correlated and so the corresponding r-dependence would also need to be
taken into account [27, 28]. Indeed, the sensitivity of the PDF to some directionality
in the phonon dispersion also implies that additional orientation considerations may
be required [29, 30]. The only unambiguous method of defining such a function is
via direct calculation from a suitable lattice dynamical model [31]: a tractable, if
unattractive, solution. However the broadening function may be defined, the key
disadvantage of this approach is the computational cost of the convolution operation,
which must be carried out for each atomic PDF at each step in the refinement.

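To make the cost argument concrete, the sketch below broadens one atom's neighbour distances (its delta-function $g_i(r,t)$) with a single Gaussian of fixed width. This deliberately ignores the correlated, r-dependent and orientation-dependent broadening discussed above; the width `sigma`, the number density and the neighbour distances are assumed values for illustration only.

```python
import numpy as np

def broadened_atomic_pdf(distances, r, sigma, rho):
    """Turn one atom's neighbour distances (delta functions) into a smooth g_i(r).

    Each neighbour at distance d contributes a normalised Gaussian of width sigma
    (an uncorrelated, r-independent broadening: an assumption). Dividing by
    4 pi r^2 rho converts the neighbour density into a g_i(r) that tends to 1
    at large r for a homogeneous system.
    """
    d = np.asarray(distances, dtype=float)[:, None]            # shape (n_neigh, 1)
    gauss = np.exp(-0.5 * ((r[None, :] - d) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return gauss.sum(axis=0) / (4 * np.pi * r ** 2 * rho)

def trapezoid(y, x):
    """Trapezoidal integral of y over x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

r = np.linspace(1.5, 3.5, 2001)
rho = 0.05                                   # assumed number density (atoms per Å^3)
g_i = broadened_atomic_pdf([2.33, 2.35, 2.36, 2.38], r, sigma=0.08, rho=rho)
# Integrating 4 pi r^2 rho g_i(r) recovers the four neighbours that were broadened.
coordination = trapezoid(4 * np.pi * r ** 2 * rho * g_i, r)
print(round(coordination, 3))   # → 4.0
```

Each call costs on the order of (number of neighbours) × (number of r-bins) operations, and it would have to be repeated for every atom at every refinement step.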
An alternative approach, and the one we have mostly adopted, is to reformulate
the PDF in terms of a distance list. This function is a discretised version of the PDF,
which takes integer arguments: $d_{\mathrm{expt}}(n)$ can be defined as the distance $r_n$ for which

$$\int_0^{r_n} 4\pi r^2 \rho\, G_{\mathrm{expt}}(r)\,\mathrm{d}r = n, \tag{9}$$

where $\rho$ is the number density. We note that a similar approach has been used
in the Liga algorithm described elsewhere [32, 33]. The corresponding INVERT
implementation can take one of two forms. Both involve calculating for each atom $i$
of type $\alpha$ the set of distances $d_i^{\alpha\beta}(n)$ to successive shells $n$ of neighbours of type $\beta$.
The first possibility is to add to the standard $\chi^2_{\mathrm{RMC}}$ function an additional variance
term of the form

$$\chi^2_{\mathrm{var}} = \sum_{\alpha,\beta} A_{\alpha\beta} \sum_n \left\{ \frac{1}{N_\alpha \langle d^{\alpha\beta}(n)\rangle^2} \sum_{i\in\alpha} \left[d_i^{\alpha\beta}(n) - \langle d^{\alpha\beta}(n)\rangle\right]^2 \right\}. \tag{10}$$

Here $A_{\alpha\beta}$ represents a suitable weighting for the various partial PDFs (such as
discussed above), and the additional $\langle d^{\alpha\beta}(n)\rangle^{-2}$ weighting is included to account for
the fact that the number of neighbours grows with distance [Fig. 2(c)]. This is the approach
we have taken when fitting to a-Si configurations. The second possibility is to fit the
$d_i(n)$ directly to the $d_{\mathrm{expt}}(n)$ extracted from $G_{\mathrm{expt}}(r)$:

$$\chi^2_{\mathrm{INVERT}} = \sum_n \left\{ d_{\mathrm{expt}}(n)^{-2} \sum_i \left[d_i(n) - d_{\mathrm{expt}}(n)\right]^2 \right\}. \tag{11}$$

This is the general approach we have taken for small molecular clusters (C60 and S12),
where the experimental $G_{\mathrm{expt}}(r)$ function is itself highly discrete and hence ineffective
at guiding refinement.

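The distance-list construction can be sketched as follows: integrate $4\pi r^2\rho\,G(r)$ to obtain a running coordination number, invert it to find the distance at which it reaches each integer $n$, and compare per-atom neighbour lists against the result. The sketch assumes $G(r)$ is normalised to 1 at large r and that the r-grid starts at 0; the function names are hypothetical and a single (α, β) pair with unit weight is used for the variance term.

```python
import numpy as np

def cumulative_trapezoid(y, x):
    """Running trapezoidal integral of y over x, starting from 0 at x[0]."""
    return np.concatenate(([0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(x))))

def distance_list(r, G, rho, n_max):
    """d_expt(n) for n = 1..n_max: the r_n at which the running coordination reaches n.

    Assumes r starts at 0 and G(r) -> 1 at large r, so 4 pi r^2 rho G(r) is a neighbour density.
    """
    n_of_r = cumulative_trapezoid(4 * np.pi * r ** 2 * rho * G, r)
    n = np.arange(1, n_max + 1)
    # n_of_r never decreases when G >= 0, so linear interpolation inverts it.
    return np.interp(n, n_of_r, r)

def chi2_distance_list(d_atoms, d_expt):
    """Eq. (11)-style misfit; d_atoms is (N, n_max), each row an atom's sorted neighbour distances."""
    return float(np.sum(np.sum((d_atoms - d_expt) ** 2, axis=0) / d_expt ** 2))

def chi2_var_distance(d_atoms):
    """Eq. (10)-style variance for one (alpha, beta) pair with unit weight."""
    d_mean = d_atoms.mean(axis=0)
    return float(np.sum(d_atoms.var(axis=0) / d_mean ** 2))

# Check against an ideal gas, G(r) = 1, for which r_n = (3 n / (4 pi rho))^(1/3).
r = np.linspace(0.0, 10.0, 20001)
rho = 0.05
d_expt = distance_list(r, np.ones_like(r), rho, n_max=20)
exact = (3 * np.arange(1, 21) / (4 * np.pi * rho)) ** (1 / 3)
print(np.allclose(d_expt, exact, rtol=1e-4))   # → True
```

In a refinement, `d_atoms` would come from sorting each atom's neighbour distances in the current configuration; here it is left to the caller.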
A third possibility for comparing discrete individual PDFs to a time- or
configurationally-averaged experimental PDF is to calculate the cumulative PDFs

$$g_i'(r) = \frac{1}{r}\int_0^r g_i(r')\,\mathrm{d}r', \tag{12}$$

$$G_{\mathrm{expt}}'(r) = \frac{1}{r}\int_0^r G_{\mathrm{expt}}(r')\,\mathrm{d}r'. \tag{13}$$

We have found this to be particularly useful for disordered network systems where the
configuration box has relatively few atoms: the fluctuations that impede direct fitting
to $G_{\mathrm{expt}}(r)$ are now smoothed by the integral. In this paper, we apply this cumulative
integral approach to the structure solution of a-SiO2.

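A minimal sketch of the cumulative comparison follows, assuming the running-average form $\frac{1}{r}\int_0^r f(r')\,\mathrm{d}r'$ for both the atomic and experimental functions (our reading of the cumulative definitions; a plain $\int_0^r$ would differ only by the factor $1/r$). The function names are hypothetical, and the grid must start at r = 0.

```python
import numpy as np

def running_average(r, y):
    """(1/r) * integral_0^r y(r') dr' on the grid r (trapezoid rule); requires r[0] == 0.

    At r = 0 the running average tends to y(0), which is used for the first point.
    """
    integral = np.concatenate(([0.0], np.cumsum(0.5 * (y[1:] + y[:-1]) * np.diff(r))))
    out = np.empty(len(r))
    out[0] = y[0]
    out[1:] = integral[1:] / r[1:]
    return out

def chi2_cumulative(r, g_atoms, G_expt):
    """Misfit between running-averaged atomic PDFs (rows of g_atoms) and the running-averaged target."""
    G_cum = running_average(r, G_expt)
    return float(sum(np.sum((running_average(r, g) - G_cum) ** 2) for g in g_atoms))

r = np.linspace(0.0, 8.0, 801)
print(np.allclose(running_average(r, r), r / 2))   # → True (exact for a linear function)
```

Because every output point averages over all smaller r, isolated spikes in an individual $g_i(r)$ are spread out, which is the smoothing effect described above.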
3. Fitting process 

While the formulations given in the preceding section suggest some natural weightings 
for the fit-to-data and variance contributions, we were interested to understand 
better the interplay between these two terms during the RMC fitting process. Here 
we are essentially asking two questions. First, to what extent does each term actually
drive the refinement? And, second, do the two terms behave similarly for different
systems: molecules vs networks; systems with a single atom environment vs those
with multiple atom environments?

Figure 3. Evolution of fit-to-data and variance contributions to $\chi^2$ during RMC
refinement of (a) C60, (b) a-Si and (c) a-SiO2.

Considering first the "nanostructured" cluster C60, our starting point is a large
box containing 60 randomly distributed C atoms. Because the system is molecular, we
do not make use of periodic boundary conditions. The experimental PDF data used
in our fit consist only of that portion that corresponds to intramolecular correlations
(i.e. r < 7 Å) and we apply the distance-list implementation of the INVERT
variance function as described by Eq. (11). The variation in components as a
function of number of accepted moves is shown in Fig. 3(a). What is clear is that the
two components behave differently throughout the refinement process. The largest
initial changes are to be found in the $\chi^2_{\mathrm{var}}$ term; it is only once this has reached
equilibrium that substantive reductions in $\chi^2_{\mathrm{RMC}}$ are observed. Examination of
the configuration at different points in the refinement suggests that the initial decrease
in $\chi^2_{\mathrm{var}}$ corresponds to the formation of a single contiguous cluster of approximately
spherical shape. The solution of the actual chemical structure of C60 appears to be
almost wholly driven by fitting to the PDF from this point onwards. Consequently,
the failure of native RMC approaches to solve the structure of C60 from PDF data
[24] may largely reflect the configurational difficulty of clustering atoms from their
initially random positions in the RMC box.

The corresponding behaviour for the disordered network solids a-Si and a-SiO2
is illustrated in Fig. 3(b,c). In contrast to the trends observed for C60, it seems that
improvements to the fit-to-data and variance terms are more strongly coupled in these
systems. If there is an obvious difference, it is that the variance component in a-SiO2
shows a more gradual and sustained improvement throughout the refinement. This
may reflect the more complex set of variance terms associated with a two-component
system. In many ways the trend in the fit-to-data term is remarkably similar for the two
refinements: a period of rapid initial improvement (up to ca 5000 moves) is followed
by a sustained but noticeably more gradual improvement that persists throughout the
remainder of the refinement.

We proceed to demonstrate that for a-Si these two refinement regimes reflect a 
period of initial density redistribution followed by subsequent small reorganisation of 
an essentially-jammed configuration. Our starting point is to consider the evolution in 
density variance throughout the refinement. This variance was calculated as follows. 
The RMC box was subdivided into a set of 50 × 50 × 50 equally sized voxels (ca 0.4 Å
each side). For each voxel, we counted the number of Si atoms contained within,
and hence determined a voxel number density $\rho(i,j,k)$, where $i,j,k$ label the voxel in
question. The density variance is then given by

$$\mathrm{Var}(\rho) = \sum_{i,j,k}\left[\rho(i,j,k) - \rho_0\right]^2, \tag{14}$$

where $\rho_0$ is the total number density. The evolution of $\mathrm{Var}(\rho)$ as a function of accepted
move is shown in Fig. 4(a), from which it is clear that the first 5000-move phase of the
refinement involves a monotonic reduction in density variance to a level that remains
essentially unchanged thereafter. This initial configurational reorganisation is the
only phase of the refinement for which large atom moves are accepted; the variation
in mean "velocity" (i.e. atom displacement per move) as a function of accepted move
also reflects this result [Fig. 4(b)].

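The voxel-based density variance can be sketched as below, assuming a cubic periodic box and Cartesian coordinates. It returns the plain sum over voxels as written in Eq. (14); dividing by the number of voxels gives a per-voxel mean instead. The function name and the toy configurations are illustrative assumptions.

```python
import numpy as np

def voxel_density_variance(positions, box_length, n_vox=50):
    """Sum over voxels of [rho(i,j,k) - rho_0]^2 for an n_vox^3 grid of cubic voxels.

    positions : (N, 3) Cartesian coordinates in a cubic periodic box of side box_length.
    Divide the result by n_vox**3 for a per-voxel mean instead of a sum.
    """
    voxel_length = box_length / n_vox
    # Wrap into the box, then map each atom to integer voxel indices (i, j, k).
    idx = np.floor(np.mod(positions, box_length) / voxel_length).astype(int)
    idx = np.clip(idx, 0, n_vox - 1)             # guard against rounding onto the upper edge
    flat = np.ravel_multi_index(tuple(idx.T), (n_vox,) * 3)
    counts = np.bincount(flat, minlength=n_vox ** 3)
    rho_voxel = counts / voxel_length ** 3        # voxel number density rho(i, j, k)
    rho_0 = len(positions) / box_length ** 3      # overall number density
    return float(np.sum((rho_voxel - rho_0) ** 2))

# Toy check: one atom at the centre of every voxel gives zero variance,
# while piling all atoms into one corner does not.
n_vox, box = 4, 4.0
grid = np.stack(np.meshgrid(*[np.arange(n_vox)] * 3, indexing="ij"), axis=-1).reshape(-1, 3)
centres = (grid + 0.5) * (box / n_vox)
print(voxel_density_variance(centres, box, n_vox))                      # → 0.0
print(voxel_density_variance(np.full((64, 3), 0.1), box, n_vox) > 0)    # → True
```

With voxels as small as ca 0.4 Å, most voxels are empty or singly occupied, so this quantity mainly tracks how evenly atoms are spread rather than a smooth local density.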
That the configuration is essentially jammed after the first 5000 accepted moves
is reflected in the histogram of trajectory lengths calculated for different regimes of the
refinement [Fig. 5]. Considering first the median displacement between initial and final
atom positions, we find that most atoms have moved approximately 2 Å throughout
the total course of the refinement (145 000 accepted moves). The same histogram
calculated for the final 140 000 moves shows a markedly lower median value of less
than 1 Å, whereas the corresponding histogram for the first 5000 moves is very similar
to the total function, with a median displacement of 1.6 Å. Summing these in quadrature
yields a total displacement of 1.9 Å, close to the overall median of approximately 2 Å,
which indicates that there is very little correlation
between moves in the two phases. So the displacements after 5000 moves reflect small
structural reorganisations without any significant changes to network topology. Our
final comment with respect to this refinement concerns the number of accepted moves
themselves. We find that the logarithm of the number of accepted moves is not a
linear function of the logarithm of the number of generated moves, but rather is linear
only over two separate regimes that correspond respectively to fewer than, and more
than, 5000 accepted moves.

Figure 4. Evolution of (a) density variance and (b) root mean squared
displacement per accepted move (velocity, v) during RMC refinement of a-Si.

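The trajectory-length analysis and the quadrature argument can be sketched as follows. The minimum-image convention for a cubic box is our assumption about how displacements across periodic boundaries are handled, and treating "less than 1 Å" as 1.0 Å is an upper-bound simplification; combining medians in quadrature is, as in the text, a rough heuristic rather than an exact rule.

```python
import numpy as np

def displacements(start, end, box_length):
    """Per-atom displacement magnitudes between two snapshots (minimum image, cubic box)."""
    delta = np.asarray(end, dtype=float) - np.asarray(start, dtype=float)
    delta -= box_length * np.round(delta / box_length)
    return np.linalg.norm(delta, axis=1)

def quadrature_sum(*lengths):
    """Expected total displacement if successive displacements are uncorrelated."""
    return float(np.sqrt(np.sum(np.square(lengths))))

# Medians quoted in the text: 1.6 Å (first 5000 moves) and < 1 Å (final 140 000 moves),
# taking 1.0 Å as an upper bound for the latter.
print(round(quadrature_sum(1.6, 1.0), 1))   # → 1.9
```

Given three snapshots (initial, after 5000 moves, final), `np.median(displacements(...))` for each pair of snapshots would give the three medians compared above.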
So what are the implications of the existence of these two regimes of RMC
refinement? The obvious concern is that the system does not have the requisite
flexibility to explore configurational space adequately. This means that the atom
positions in the final configuration are likely to depend strongly on the fluctuations
present in the initial starting configuration. To test this hypothesis, we set up a
refinement for a-Si for which the initial configuration was based on the structure of
crystalline silicon itself. We found that the system was unable to eradicate the bias
of its initial periodicity, giving a fit-to-data that was approximately 10% worse than
for refinements set up using random initial configurations. The final variance term
is similar for both types of refinement (although, of course, $\chi^2_{\mathrm{var}}$ is initially zero-valued
for the crystalline starting configuration), so it is clear that the refinement is genuinely
jammed: the total $\chi^2$ remains higher than it would otherwise be. Perhaps the most
interesting conclusion to be drawn here is that the behaviour of the variance term in the
early refinement region, where it is acting to increase homogeneity, is advantageous
in the case of a molecular system, but may in fact contribute to the jamming problem
for networks.

Figure 5. Trajectory length distributions for the (a) total, (b) final, and (c)
initial RMC refinement regimes of a-Si.



4. Solution spaces 

We return now to the problem of optimising the relative weights of fit-to-data and
local variance terms. In our earlier study [24], we used an ad hoc method to explore
the parameter space created by the weightings of these terms. While it was the case
that we obtained satisfactory refinements, we did not know whether the weightings we
had chosen were the optimum values, or indeed how sensitive the solution space was to
variation in those parameters. Sensitivity becomes an especially important factor if
there is no single transferable weighting scheme. So the approach we
take here is to explore the refinement "success" for C60, a-Si and a-SiO2 as a function
of data and variance weightings. Our interest is in the quality of the configuration,
rather than the magnitude of $\chi^2$, and so we use as our metric the number of atoms with
the correct coordination number. In the case of C60, configurations with 60 (or nearly
60) threefold-coordinated C atoms universally correspond to good structural models
with the expected truncated icosahedral shape. For the network solids, coordination
number remains a useful metric, although we note that in itself it is blind to unphysical
features such as the existence of triangular ring structures.

Figure 6. Representations of the solution spaces for refinements of (a) C60,
(b) a-Si, and (c,d) a-SiO2. Panel (a) shows the number of triply coordinated
C atoms in RMC configurations for C60 after 10® proposed moves; panel (b)
gives the number of fourfold-coordinated Si atoms in configurations of a-Si (total
512 atoms); panels (c) and (d) give the number of fourfold-coordinated Si atoms
in configurations of a-SiO2 (total 64 Si atoms and 128 O atoms) with pairwise
variance terms weighted equally in (c) but by the relevant $b_i b_j c_i c_j$ terms in (d).

Considering first the solution space we observe for C60, we find a remarkably
well-defined plateau that extends across some 14 orders of magnitude of different weighting
values [Fig. 6(a)]. Convergence appears to be predicated primarily on the INVERT
constraint: in particular, in its absence there appears to be no prospect of structure 
solution. The ability of the INVERT term to drive structure solution by itself is surely 
a result of the specific implementation we have used in this case: the formalism of 
Eq. (11) includes an implicit fit-to-data term. Indeed the only effect of weighting the 
data contribution more strongly is to reduce the likelihood of obtaining the correct 
solution. So it appears that in this instance the natural weighting of INVERT and 
fit-to-data is precisely that needed to obtain a reproducible and accurate structure 
solution. 

As suggested by their different refinement behaviour, the parameterisation plots
we obtain for a-Si and a-SiO2 reveal very different sensitivities from that observed
for C60 [Fig. 6(b,c)]. For both of the network solids, there appears to exist a
well-defined optimum set of weights that is not obviously related between the two
systems. Moreover, the region of maximum coordination number exists only over
a relatively narrow range of weightings, and (especially in the case of a-SiO2) it
can also lie adjacent to other regions of very poor solution quality (note the rapid
variation in final coordination number observed for different values of $\log(w_{\mathrm{PDF}})$ at
$\log(w_{\mathrm{INVERT}}) \approx -1$). Because we are making use of the INVERT implementation
outlined in Eq. (10), which does not contain any implicit fit-to-data component, we
find for these network structures that sufficiently high weightings of $\chi^2_{\mathrm{INVERT}}$ lead to
unphysical structures.

Because a-Si02 contains atoms that exist in two different local environments (Si 
and O), we used this system to establish the extent to which its solution space is 
affected by different weighting schemes for the various pairwise variance terms. We 
have explored two schemes in particular: the first makes use of identical weights
for each of the four variance terms; the second applies a weighting based on the
$c_i c_j b_i b_j$ terms outlined in Section 2. The corresponding solution space plots are given
in Fig. 6(c,d), where it is clear that their overall form is essentially unchanged by the
variation in weighting scheme. Intriguingly, the variation with fit-to-data weighting
is almost identical in the two approaches; instead the most obvious difference is the
location of the global maximum along the INVERT weighting axis. The difference in
optimal $\log(w_{\mathrm{INVERT}})$ values of approximately 0.75 is much smaller than the logarithm
of the square of the $c_i c_j b_i b_j$ terms used (ca 3). Consequently the effective INVERT
weighting at the maximum in Fig. 6(d) is larger than that at the maximum in Fig. 6(c).
This suggests that the crucial pairwise variance is relatively weakly weighted amongst
the $c_i c_j b_i b_j$. The smallest of these corresponds to the Si/Si pairs, so the implication
here is that it is the arrangement of Si atoms around each other that is critical in
defining the structure of a-SiO2.

Having established the effects of the various weightings employed during an
RMC+INVERT refinement, we subsequently explored the role of configuration box
size. To this end, we compared the outcomes for a-Si refinements based on
configurations containing 64 and 512 atoms. Variation in box size has a number
of consequences for the refinement strategy. Clearly the speed of convergence is much
faster for small boxes. But there are more subtle differences as well. The quantity
of data used in the fit also differs, for example, because $r_{\max} \propto N^{1/3}$. It is not
necessarily straightforward to adjust weightings to compensate fully for this change,
as variation in $N$ also affects the discreteness of the calculated $G(r)$, which in turn
affects how closely the data can be fitted. Nevertheless we find that the optimal ratio
of data and INVERT weighting for both 64- and 512-atom systems is roughly similar
(the logarithmic difference between the two weightings is 2.55 for the former and 3.3
for the latter). Consequently the shape of the solution space appears to be largely
unaffected by configuration box size.

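The $N^{1/3}$ scaling of the usable r-range follows from the box geometry. The sketch below assumes a cubic box with $r_{\max}$ taken as half the box length (the usual minimum-image limit) and an approximate number density of 0.05 atoms Å⁻³ for a-Si; both are illustrative assumptions rather than the exact settings used here.

```python
import numpy as np

def r_max(n_atoms, rho):
    """Largest usable r in a cubic periodic box: half the box length L = (N / rho)^(1/3)."""
    return 0.5 * (n_atoms / rho) ** (1.0 / 3.0)

rho_si = 0.05   # atoms per Å^3; an assumed, approximate number density for a-Si
for n in (64, 512):
    print(n, round(r_max(n, rho_si), 2))
```

Because 512/64 = 2³, the larger box doubles $r_{\max}$ and so doubles the number of data points on a fixed r-grid, which is one reason why the weightings need not transfer exactly between box sizes.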
We conclude this section by flagging the existence of a range of interesting
near-solutions in the case of S12. We first investigated this molecule as an example of a
small nanostructured system with a single type of atom in two environments: its ring
structure contains equatorial and axial S atoms that have similar nearest-neighbour
distances but slightly different next-nearest-neighbour separations. While the global
minimum in configurational space does indeed correspond to the correct structure,
the small molecule size results in a number of distinct local minima with intriguing
structures (a number of which are shown in Fig. 7). Clearly the chemistry of these
candidate structures would be very different! But our purpose in highlighting their
existence here is to provide a caveat for small-molecule structure solution using PDF
data: the solution space may well be densely populated by diverse candidates with
surprisingly similar PDFs but very different three-dimensional structures.

Figure 7. Illustrations of molecular geometries associated with local minima in
the INVERT-weighted configuration space for S12. The blue and red vertical bars
represent the corresponding relative magnitudes of, respectively, the $\chi^2_{\mathrm{RMC}}$ and
$\chi^2_{\mathrm{INVERT}}$ terms. The global minimum occurs for the correct geometry, which is
highlighted on the right-hand side.

5. Outlook and Conclusions 

Arguably one of the most appealing features of the INVERT approach when it was 
first proposed was the apparent transferability of the same simple idea across the very 
different "nanostructure" problems of molecular phases and disordered networks alike. 
Yet perhaps the clearest results of this extended study have been the demonstration
that INVERT can work very differently in these different systems, and also that the
balance between favouring local structural invariance on the one hand, and producing
the best-quality fit to data on the other, is not only delicate but also system-dependent.
Furthermore, we have shown that RMC refinement becomes effectively
jammed after a relatively small number of accepted moves, such that configurational 
space is not yet effectively explored. This has the important consequence that 
structure solutions may depend on features of the initial configuration; and so we 
are probably not yet in a position to claim that nanostructure "solution" is possible 
at all using PDF methods. 

But we have also tried to show that there are many feasible implementations of 
the INVERT approach — even within the limited definition of pairwise correlations we 
have explored here. It is obvious that the approach is already capable of helping guide 
refinement (C60 being a particularly successful example), and one obvious avenue for
further research is to determine more robustly which of the various $\chi^2_{\mathrm{var}}$ formalisms is
most effective in this respect. It may also prove beneficial to re-evaluate the RMC move
strategy: in particular, is it possible to sample configurational space more effectively
than with the standard algorithm of small individual moves? And, finally, we note the
suggestion we have made elsewhere […] that consideration of higher-order correlation
functions and local symmetry in an INVERT-type refinement may prove a useful avenue
for further improving the method; this is an area of research we are actively pursuing.